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ABSTRACT 

In this paper, we propose a method to address the problem of source 
estimation for Sparse Component Analysis (SCA) in the presence of 
additive noise. Our method is a generalization of a recently proposed 
method (SLO), which has the advantage of directly minimizing the 
/'-norm instead of ^-norm, while being very fast. SLO is based on 
minimization of the smoothed ^°-norm subject to As = x. In order 
to better estimate the source vector for noisy mixtures, we suggest 
then to remove the constraint As = x, by relaxing exact equality to 
an approximation (we call our method Smoothed ^°-norm Denois- 
ing or SLODN). The final result can then be obtained by minimiza- 
tion of a proper linear combination of the smoothed ^°-norm and a 
cost function for the approximation. Experimental results empha- 
size on the significant enhancement of the modified method in noisy 
cases. 

Index Terms — atomic decomposition, sparse decomposition, 
sparse representation, over-complete signal representation, sparse 
source separation 

1. INTRODUCTION 

Blind source separation (BSS) consists of detecting the underlying 
source signals within some observed mixtures of them without any 
prior information about the sources or the mixing system. Let x £ 
R n be the vector of observed mixtures and s £ R m denote the vec- 
tor of unknown source signals. The mixing equation for the linear 
instantaneous noisy model will be: 

x = As + n (1) 

where A is the n x m unknown mixing matrix and n denotes the 
additive noise vector. The aim of BSS is then to estimate s from 
observed data x without any knowledge of the mixing matrix, A, or 
the source signals. 

In the determined case, when n > m, the problem can be suc- 
cessfully solved using Independent Component Analysis (ICA) [ 1 1. 
However, in the underdetermined (or over-complete) cases where 
fewer observations than sources are provided, even if A is known, 
there are infinitely many solutions to the problem since the number 
of unknowns exceeds the number of equations. This ill-posedness 
could be resolved by the assumption of 'Sparsity', i.e. resulting in 
non totally blind source separation problem. A signal is considered 
to be sparse when only a few of its samples take significant values. 
Thus, among all possible solutions of |Q} we seek the sparsest one, 
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which has then minimum number of nonzero components, i.e. min- 
imum ^°-norm. 

SCA can also be viewed as the problem of representing a signal 
x £ K™ as a linear combination of m vectors, called atoms |2). The 
atoms {ip i }YL 1 collectively form a dictionary, n x m matrix, over 
which the signal is to be decomposed.There are special interests in 
the cases where m > n (refer for example to |3| and the references 
in it). Again we have the problem of finding the sparsest solution of 
the set of underdetermined linear equations x = X/Iii s »¥>i where 
<1? = [ipi, • • • , <p m ] is the dictionary of m atoms. This problem is 
also called 'atomic decomposition' and has many potential applica- 
tions in diverse fields of science (3). 

The general Sparse Component Analysis (SCA) problem con- 
sists of two steps: first estimating the mixing matrix, and then finding 
the sparsest source vector, assuming the mixing matrix to be known. 
The first step can be accomplished by means of clustering meth- 
ods |4|. In this paper, we focus our attention on the second step; 
that is for a given mixing matrix, we wish to find the solution to the 
following minimization problem: 

s = argmin||s||o subject to x = As (2) 

where ||s||o denotes the number of non-zero elements of s (and is 
usually called the ^°-norm of s). 

So far, several algorithms such as Basis Pursuit (BP) |5, 6| and 
Matching Pursuit (MP) (2||4] have been proposed to approximate 
the solution of I0. The former is based on the observation that for 
most large underdetermined systems of linear equations the mini- 
mal £ -norm (J^\ |s» | ) solution is also the sparsest solution |6|. 
The minimization of ^-norm can be efficiently solved using Linear 
Programming (LP) techniques |7 |. Despite all recent developments, 
computational efficiency has still remained as a main concern. 

Recently in | 8 ], the idea of using smoothed £°-norm (SLO) 
was introduced. More precisely this algorithm minimizes a smooth 
approximation of the ^°-norm denoted by m — F a (s), and the ap- 
proximation tends to equality when a — > 0. The algorithm then 
sequentially solves the problem: 

maximize F a (&) s.t. As = x (3) 

for a decreasing sequence of o. 

This approximation accommodates for both continuous optimiza- 
tion techniques to estimate the sparsest solution of {2]l and a noise- 
tolerant algorithm. The idea turned out to be both efficient and ac- 
curate, i.e. providing a better accuracy than £ 1 -norm minimization 
algorithms while being about two orders of magnitude faster 1 8 1 than 
LP. 

However, the proposed algorithm has not been designed for the 
noisy case Q}, where a noise vector, n, has been added to the ob- 
served mixture x. In this paper, we will try to generalize the proposed 



method to this noisy case by removing the As = x constraint and 
relaxing the exact equality to an approximation. In sparse decompo- 
sition viewpoint, this means an approximate sparse decomposition 
of a signal on an over-complete dictionary. The final algorithm will 
then be an iterative minimization of a proper linear combination of 
smoothed ^°-norm and || As — x|||. 

This paper is organized as follows. Section 2 discusses the main 
idea of the proposed method. Section 3 gives a formal statement 
of the final algorithm. Finally, experimental results are presented in 
Section 4. 

2. MAIN IDEA 

As stated in the previous section, when the dimensions increase, 
finding the minimum ^°-norm solution of {2]l is impractical for two 
reasons. Firstly because ^°-norm of a vector is a discontinuous func- 
tion of its elements and leads to an intractable combinatorial opti- 
mization, and secondly because of the solution being highly sen- 
sitive to noise. The idea of |8| is then to replace the ^°-norm by 
continuous function, which approximates Kronecker delta function, 
and use optimization techniques to minimize it subject to As = x, 
as a constraint. For example, consider the Gaussian like function: 

m 

F a (s) =]TexpM/2<7 2 ) ( 4 ) 

i—l 

where Si denotes the i-th element of vector s. For sufficiently small 
values of a, F a (s) tends to count the number of zero elements of the 
vector s. Thus we have: 

||s|| = m- limF CT (s) (5) 

where m is the dimension of the vector s. The sparsest solution 
of l(2j can then be approximated by the solution of the following 
minimization problem: 

s = argmin (m — F CT (s)) subject to x = As (6) 

The above minimization task can be accomplished using common 
gradient type (e.g. steepest descent) algorithms. Note that the value 
of a determines how smooth the function F a is; the smaller the value 
of a, the better the estimation of 1 1 s 1 1 o but the larger the probability of 
being trapped in local minima of the cost function. The idea of (8) 
for escaping from local minima is then to use a decreasing set of 
values for a in each iteration. More precisely for each value of a the 
minimization algorithm is initiated with the minimizer of the F a (s) 
for the previous (larger) value of a. 

Now consider a more realistic case where a noise vector, n, 
has been added to the observed mixture, as in (TJl. Here we no- 
tice that we have an uncertainty on exact value of the observed vec- 
tor and it seems reasonable to remove the x = As constraint and 
reduce it to x » As. This idea is based on the observation that 
in presence of considerable noise, this constraint may lead to a to- 
tally different sparse decomposition. Thus we wish to minimize two 
terms; ||As — x|| 2 as cost of approximation, and the smoothed £°- 
norm (m — F a (s)), as the measure of sparsity. 

For the sake of simplicity, we choose ||As — x||| as the cost 
of approximation. Therefore, the idea will naturally leads us to the 
following minimization problem: 

s = argmin J CT (s) = (m — F a (s)) + A ||As — x||| (7) 

where A > 0, represents a compromise between the two terms of 
our cost function; sparsity and equality condition. Intuitively, we 



may expect that for less noisy mixtures, the value of A should be 
greater than that of observations with high noise quantity. Further 
discussion on the choice of A is left to Section|4] 

Another advantage of removing x = As constraint appears 
when the dictionary matrix, A, is not full rank. In this case satis- 
fying the exact equality constraint for observed vectors, which are 
not in column space of A is impossible and as a result the previous 
algorithm fails to find any answer. 

3. FINAL ALGORITHM 

The final algorithm is shown in FigQ] We call our algorithm SLO 
DeNoising (SLODN). As seen in the algorithm, the final values of the 
previous estimation are used for the initialization of the next steepest 
descent step. The decreasing sequence of a is used to escape from 
getting trapped into local minima. 
Direct calculations show that: 

As=^M=A(2A T (As-x)) 



In the minimization part, the steepest descent with variable step- 
size (fi) has been applied: If fj, is such that J a (s — /xAs) < J CT (s) 
we multiply it by 1.2 for the next iteration, otherwise it is multiplied 
by 0.5. 

4. EXPERIMENTAL RESULTS 

In this section we investigate the performance of the proposed method 
and present our simulation results. Since our framework is a gener- 
alization of the idea presented in |8 |, the practical considerations in 
that paper can be directly imported into our framework. 

In (8), it has been experimentally shown that SLO is about two 
orders of magnitude faster than the state-of-the-art interior-point LP 
solvers |7|, while being more accurate. We provide the comparison 
results of our method with the SLO method. Moreover a comparison 
with Basis Pursuit Denoising will be presented. 

In all experiments, sparse sources have been artificially gener- 
ated using a Bernoulli-Gaussian model: each source is 'active' with 
probability p, and is 'inactive' with probability 1 — p. If it is active, 
its value is modeled by a zero-mean Gaussian random variable with 
variance a^ n ; if it is not active, its value is modeled by a zero-mean 
Gaussian random variable with variance where cr^g <C 0"on- 
Consequently, each Si is distributed as: 

Si ~ p ■ JV(0, a on ) + (1 - p) ■ M(Q, o oB ), (9) 

Sparsity implies that p< 1. We considered p — 0.1, <r g =0.01 
and cr on = 1. Elements of the mixing matrix, A, and noise vector, 
n, were also considered to have normal distributions with standard 
deviation of 1 and <r n , respectively. As in (§J, the set of decreasing 
values for a was fixed to [1, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01]. 

Experiment 1. Optimal value of A 

In this experiment, we investigate the effect of A on the perfor- 
mance of our method. We set the dimensions to m — 1000, n = 
400, and for each value of <r n = 0, 0.01, . . . , 0.15 we plotted the av- 

11 11 2 

erage Signal to Noise Ratio (SNR), defined by 101og 10 m^ JLis . as 
a function of A (in this section, all the results are averaged over 100 
experiments). Figure [2] shows a sample of our experiments. Dash 
line represents the results obtained from l(6}, which is independent 



• Initialization: 

1. Letso = A T (AA T )"'x. 

2. Choose a suitable value for A as a function 
of a n . The value of <r n for a set of observed 
mixtures may be estimated either directly from 
the observed mixtures (see for example |9| 
and references therein) or using a bootstrap 
method (discussed in experiment 1 of Sec- 
tion |4). 

3. Choose a suitable decreasing sequence for a, 
[<7i . . . ok]- and a sufficiently small value for 
the step-size parameter, p. 

• For k = 1, . . . , K : 

1 . Let a = ah. 

2. Minimize (approximately) the function J CT (s) 
using L iterations of the steepest descent algo- 
rithm: 

- Initialization: s <— §k_i. 

- for j = 1 ... L (loop L times): 

(a) Let: As = A(2A T (As - x)) + 

(b) If J CT (s-/iAs) < J CT (s)letp= 1.2 
else p = 0.5. 

(c) Let s <— s — pAs 

(d) Let p <— p x p. (variable step-size) 

(e) Set Sk <— s. 

• Final answer is § = sk ■ 



Fig. 1. The final algorithm of SLODN. 



of A. Note that, there exists an interval in which the choice of A will 
result in a better estimation compared to SLO. The SNR takes its 
maximum in this region for some value of A, which we call A opt . 

As mentioned in the previous section, we expect an appropriate 
choice of A to be a decreasing function of cr n since with the increase 
of noise power, the cost of approximation (Ax ~ s) decreases. To 
verify this, for each value of a n , we obtained the value of A opt using 
the curves similar to Fig. [2] Figure [3] shows the values of A opt as 
a function of a n in [0,0.15]. We fit these results with a curve of 
type a+/3x 'i t0 And me following rule of thumb for the choice of 
parameter A: 



~ 0.007 + 3.5<t2 ' 

This formula gives a rough approximation for the choice of appro- 
priate A in the initialization step of the algorithm. 

Notice that we have two choices for the initialization of the pro- 
posed method: either to estimate a n directly from the observed mix- 
tures (9 ] and then use d 10b to find an approximation of A op t, or to 
follow this iterative approach to solve the problem: 

1. choose an arbitrary reasonable value of cr n . 

2. take A opt from the curve. 
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Fig. 2. Average Output SNR for different choices of A for a n =0.05. 
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Fig. 3. A op t as a function of noise power (<r n ). The continuous curve 
shows our approximation of A op t- 



3. run the algorithm and after convergence, compute an estima- 
tion of a n from the obtained source vector and then goto step 
2. 

Experiment 2. Speed and performance 

In order to measure the speed of our algorithm, we run the al- 
gorithm 100 times for m = 1000, n = 400 and a n = 0.05. The 
simulation is performed in MATLAB7 environment using an Intel 
2.8Ghz processor and 512MB of memory. The average run time of 
SLODN was 2.062 seconds while the average time for SLO was 0.242 
seconds. Although SLODN is somehow slower than SLO, but regard- 
ing to Table I in 1 8 1, the algorithm is still much faster than ^i-magic 
and FOCUSS. 

We proceed with the performance analysis of the proposed algo- 
rithm. In this experiment, we fix the parameters m,n,p with those 
of experiment 1 and for each value of cr n , choose the value of A with 
dlOb . In Fig.|4]the average output SNR is compared to the results of 
SLO. It can be seen that except for low-noise mixtures (<r n < 0.02), 
SLODN achieves a better SNR. Thus for noisy mixtures, the case for 
most real data, the act of approximately satisfying As = x con- 
straint is justified experimentally. 
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Fig. 4. Comparison between SLODN, SLO and BPDN. 
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Fig. 5. Average Output SNR versus m. Averages are taken over 100 
experiments. 



We also compared the results SLODN with Basis Pursuit De- 
Noising (BPDN) which is much faster than BP. We used Gradi- 
ent Projection for Sparse Reconstruction (GPSR) 1 10 1 algorithm for 
BPDN. The results of GPSR are shown in Fig. E] with dotted line. 
As we see, the average SNR curve of GPSR lies under the two other 
curves except for low noise mixtures. It worths mentioning that the 
average run time of GPSR was 3.156 seconds. 

Experiment 3. Dimension Dependency 

In this experiment we study the performance of the proposed 
method for different dimensions of sources and mixtures. In this 
experiment, the values of m and n change within a constant ratio 
(n — 0.4m). The average output SNR for both methods are shown 
in Fig[5] The results suggest that the quality of estimation is almost 
independent of the dimensions. 



The proposed method was based on smoothed /?°-norm 
minimization and satisfying the equality constraint approximately 
instead of exact equality constraint. The proposed method is fast 
while being more robust against noisy mixtures than the original 
SLO. Experimental results approved the performance and the noise- 
tolerance of our method for noisy mixtures. 
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5. CONCLUSION 

We presented a fast method for Sparse Component Analysis (SCA) 
or atomic decomposition on over-complete dictionaries, in presence 
of additive noise. The method was a generalization of SLO method. 



